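Automatic speech recognition (ASR) is a key element of new services that help users interact with automated systems. Deep learning methods have made it possible to deploy English ASR systems with a word error rate (WER) below 5%. However, these methods are only usable for languages with hundreds or thousands of hours of audio and corresponding transcriptions. To accelerate the availability of resources that can improve ASR performance for so-called low-resource languages, methods that create new resources from existing ones are being investigated. In this paper, we describe our data augmentation approach for improving the results of ASR models for low-resource and agglutinative languages. We carried out experiments developing an ASR system for Quechua using the wav2letter++ model. With our approach, we reduced the WER by 8.73% compared with the base model. The resulting ASR model obtained a WER of 22.75% and was trained on 99 hours of original resources and 99 hours of synthetic data produced by combining text augmentation and synthetic speech generation.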
translated by 谷歌翻译
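The word error rate quoted in the abstract above is conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used by the authors; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.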
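The Huqariq corpus is a multilingual collection of native languages of Peru. The transcribed corpus is intended for the research and development of speech technologies to preserve endangered languages in Peru. Huqariq is primarily designed for the development of automatic speech recognition, language identification, and text-to-speech tools. To make corpus collection sustainable, we adopt a crowdsourcing approach. Huqariq includes four native languages of Peru and is expected to reach up to 20 of Peru's 48 native languages by the end of 2022. The corpus has 220 hours of transcribed audio recorded by more than 500 volunteers, making it the largest corpus for native languages of Peru. To verify the quality of the corpus, we present speech recognition experiments using the 220 hours of fully transcribed audio.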
Vehicle-to-Everything (V2X) communication has been proposed as a potential solution to improve the robustness and safety of autonomous vehicles by improving coordination and removing the barrier of non-line-of-sight sensing. Cooperative Vehicle Safety (CVS) applications depend tightly on the reliability of the underlying data system, which can suffer from loss of information due to inherent issues in its components, such as sensor failures or the poor performance of V2X technologies under dense communication channel load. In particular, information loss affects the target classification module and, subsequently, the performance of the safety application. To enable reliable and robust CVS systems that mitigate the effect of information loss, we propose a Context-Aware Target Classification (CA-TC) module coupled with a hybrid learning-based predictive modeling technique for CVS systems. The CA-TC consists of two modules: a Context-Aware Map (CAM) and a Hybrid Gaussian Process (HGP) prediction system. Vehicle safety applications then use the information from the CA-TC, making them more robust and reliable. The CAM leverages vehicles' path history, road geometry, tracking, and prediction, while the HGP provides accurate vehicle trajectory predictions to compensate for data loss (due to communication congestion) or inaccurate sensor measurements. Based on offline real-world data, we learn a finite bank of driver models that represent the joint dynamics of the vehicle and the driver's behavior. We combine offline training and online model updates with on-the-fly forecasting to account for possible new driver behaviors. Finally, our framework is validated using simulation and realistic driving scenarios to confirm its potential for enhancing the robustness and reliability of CVS systems.
Are extralinguistic signals such as image pixels crucial for inducing constituency grammars? While past work has shown substantial gains from multimodal cues, we investigate whether such gains persist in the presence of rich information from large language models (LLMs). We find that our approach, LLM-based C-PCFG (LC-PCFG), outperforms previous multimodal methods on the task of unsupervised constituency parsing, achieving state-of-the-art performance on a variety of datasets. Moreover, LC-PCFG results in an over 50% reduction in parameter count, and speedups in training time of 1.7x for image-aided models and more than 5x for video-aided models. These results challenge the notion that extralinguistic signals such as image pixels are needed for unsupervised grammar induction, and point to the need for better text-only baselines when evaluating whether multimodality is needed for the task.
As demand for large corpora increases with the size of current state-of-the-art language models, using web data as the main part of the pre-training corpus for these models has become ubiquitous practice. This, in turn, has introduced an important challenge for NLP practitioners, who are now confronted with the task of developing highly optimized models and pipelines for pre-processing large quantities of textual data, which means effectively classifying and filtering multilingual, heterogeneous, and noisy data at web scale. One of the main components of this pre-processing step for the pre-training corpora of large language models is the removal of adult and harmful content. In this paper, we explore different methods for detecting adult and harmful content in multilingual, heterogeneous web data. We first show how traditional methods for harmful content detection, which seemingly perform quite well on small and specialized datasets, quickly break down when confronted with heterogeneous, noisy web data. We then resort to a perplexity-based approach, but with a twist: instead of using a so-called "clean" corpus to train a small language model and then using perplexity to select the documents with low perplexity, i.e., the documents that most resemble this so-called "clean" corpus, we train solely on adult and harmful textual data and then select the documents with a perplexity value above a given threshold. This approach virtually clusters our documents into two distinct groups, which greatly facilitates the choice of the perplexity threshold and also allows us to obtain higher precision than traditional classification methods for detecting adult and harmful content.
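The inverted perplexity filter described in the abstract above can be illustrated with a toy example: fit a small language model on unwanted text only, then keep documents whose perplexity under that model exceeds a threshold. The sketch below is not the authors' pipeline. It swaps in an add-one-smoothed unigram model for whatever small language model the paper uses, and the names `UnigramLM` and `keep_high_perplexity`, the whitespace tokenization, and the threshold value are all assumptions for illustration.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one-smoothed unigram model (a stand-in for a real small LM)."""

    def __init__(self, documents):
        self.counts = Counter(
            token for doc in documents for token in doc.lower().split()
        )
        self.total = sum(self.counts.values())
        # One extra vocabulary slot acts as a shared bucket for unseen words.
        self.vocab_size = len(self.counts) + 1

    def perplexity(self, document):
        tokens = document.lower().split()
        if not tokens:
            # Assumption: an empty document is treated as maximally unlike
            # the training data.
            return float("inf")
        log_prob = sum(
            math.log((self.counts[t] + 1) / (self.total + self.vocab_size))
            for t in tokens
        )
        return math.exp(-log_prob / len(tokens))


def keep_high_perplexity(documents, model, threshold):
    """Keep documents that look unlike the unwanted training text."""
    return [doc for doc in documents if model.perplexity(doc) > threshold]


# Placeholder "unwanted" training text; real data would be adult or harmful content.
unwanted_lm = UnigramLM(["spam offer spam offer", "spam spam offer"])
kept = keep_high_perplexity(
    ["spam offer", "weather report today"], unwanted_lm, threshold=5.0
)
```

In this toy setup, `"spam offer"` scores a low perplexity because its words dominate the training text, while the unrelated document scores high and is kept. A real threshold would be chosen from the observed perplexity distribution, which the abstract reports as splitting into two distinct groups.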
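In this work, we exploit the universal approximation property of neural networks (NNs) to design interconnection and damping assignment (IDA) passivity-based control (PBC) schemes for fully actuated mechanical systems in the port-Hamiltonian (pH) framework. To this end, we recast the IDA-PBC method as a supervised learning problem that solves the partial differential matching equations while satisfying the equilibrium assignment and Lyapunov stability conditions. The main consequence is that the output of the learning algorithm has a clear control-theoretic interpretation in terms of passivity and Lyapunov stability. The proposed control design method is validated through numerical simulations of mechanical systems with one and two degrees of freedom.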
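Increasing the cardinality of categorical variables may degrade the overall performance of machine learning (ML) algorithms. This paper presents a novel computational preprocessing method for converting categorical variables into numerical variables for ML algorithms. In this method, we select three categorical features and convert them to numerical features. First, we choose a threshold parameter based on the distribution of categories in the variables. Then, we use conditional probabilities to convert each categorical variable into two new numerical variables, yielding six new numerical variables in total. After that, we feed these six numerical variables to the principal component analysis (PCA) algorithm. Next, we select all or some of the principal components (PCs). Finally, by applying binary classification with ten different classifiers, we measure the performance of the new encoder and compare it with 17 other well-known category encoders. The proposed technique achieved the highest performance in terms of area under the curve (AUC) for high-cardinality categorical variables on the well-known cybersecurity NSL-KDD dataset. In addition, we define harmonic average metrics to find the best trade-off between train and test performance and to prevent underfitting and overfitting. Ultimately, the number of newly created numerical variables is minimal. Consequently, this data reduction improves computational processing time, which may reduce the data to be processed in future 5G telecommunication networks.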
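The abstract above does not say which two conditional probabilities are computed per feature or how the threshold is applied, so the following is only one plausible reading, sketched for a single categorical feature and a binary label. It assumes that categories rarer than a hypothetical `min_count` threshold are pooled into an `"__other__"` bucket and that the two new variables are P(category | label = 1) and P(category | label = 0). The PCA step is omitted, and the category values are illustrative.

```python
from collections import Counter


def conditional_probability_encode(categories, labels, min_count=2):
    """Map each category to (P(category | y=1), P(category | y=0)).

    Assumption: categories seen fewer than `min_count` times share one bucket.
    """
    frequency = Counter(categories)
    pooled = [
        c if frequency[c] >= min_count else "__other__" for c in categories
    ]

    positives = Counter(c for c, y in zip(pooled, labels) if y == 1)
    negatives = Counter(c for c, y in zip(pooled, labels) if y == 0)
    n_pos = sum(positives.values())
    n_neg = sum(negatives.values())

    table = {
        c: (
            positives[c] / n_pos if n_pos else 0.0,
            negatives[c] / n_neg if n_neg else 0.0,
        )
        for c in set(pooled)
    }
    return [table[c] for c in pooled], table


# Illustrative values only; not taken from the paper's experiments.
encoded, table = conditional_probability_encode(
    ["tcp", "tcp", "udp", "udp", "icmp"], [1, 0, 1, 1, 0], min_count=2
)
```

Each row of `encoded` is a pair of numbers, so three categorical features encoded this way would give the six numerical variables that the abstract then passes to PCA.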
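Training intelligent agents that can drive autonomously in a variety of urban and highway scenarios has been a hot topic in the robotics community over the past decades. However, the diversity of driving environments in terms of road topology and the positioning of neighboring vehicles makes this problem very challenging. It goes without saying that although scenario-specific driving policies for autonomous driving are promising and can improve transportation safety and efficiency, they are clearly not a universal, scalable solution. Instead, we seek decision-making schemes and driving policies that can generalize to novel and unseen environments. In this work, we capitalize on the key idea that human drivers learn abstract representations of their surroundings that are fairly similar across various driving scenarios and environments. Through these representations, human drivers are able to adapt quickly to novel environments and drive in unseen conditions. Formally, by imposing an information bottleneck, we extract a latent representation that minimizes the distance between driving scenarios (a quantification that we introduce to gauge the similarity among different driving configurations). This latent space is then employed as the input to a Q-learning module to learn generalizable driving policies. Our experiments show that using this latent representation can reduce the number of crashes to about half.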